Deceptive review detection via hierarchical neural network model with attention mechanism

YAN Mengxiang, JI Donghong, REN Yafeng

Journal of Computer Applications 2019, 39 (7): 1925-1930. DOI: 10.11772/j.issn.1001-9081.2018112340

To address the failure of traditional discrete models to capture the global semantic information of the whole review text in deceptive review detection, a hierarchical neural network model with an attention mechanism was proposed. Firstly, different neural network models were used to model the structure of the text, and the model yielding the best semantic representation was identified. Then, the review was modeled by two attention mechanisms, one based on the user view and one based on the product view. The user view focused on the user's preferences expressed in the review text, and the product view focused on the product features mentioned in it. Finally, the two representations learned from the user and product views were combined as the final semantic representation for deceptive review detection. Experiments were carried out on the Yelp dataset with accuracy as the evaluation metric. The experimental results show that the proposed model performs best, with accuracy 1 to 4 percentage points higher than traditional discrete methods and existing neural benchmark models.
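
The two-view attention step can be illustrated with a minimal sketch. It assumes dot-product attention over already-encoded sentence vectors and concatenation as the way the two views are combined; the abstract does not specify either choice, and the names `attend`, `review_representation`, `user_vector` and `product_vector` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, view_vector):
    """Pool hidden states with attention weights driven by one view vector.

    Assumption: scores are plain dot products between each hidden state
    and the view vector (user view or product view).
    """
    scores = [sum(h * v for h, v in zip(state, view_vector))
              for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * state[d] for w, state in zip(weights, hidden_states))
              for d in range(dim)]
    return pooled, weights


def review_representation(hidden_states, user_vector, product_vector):
    """Combine the user-view and product-view representations.

    Assumption: "combined" is implemented here as list concatenation.
    """
    user_repr, _ = attend(hidden_states, user_vector)
    product_repr, _ = attend(hidden_states, product_vector)
    return user_repr + product_repr


# Toy input: two 2-dimensional sentence encodings.
hidden = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = attend(hidden, [1.0, 0.0])
final = review_representation(hidden, [1.0, 0.0], [0.0, 1.0])
```

With this toy input the user view puts more weight on the first sentence and the product view on the second, so the combined vector keeps both emphases side by side.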

Joint model of microblog emotion recognition of emoticons and emotion cause detection based on neural network

ZHANG Chen, QIAN Tao, JI Donghong

Journal of Computer Applications 2018, 38 (9): 2464-2468. DOI: 10.11772/j.issn.1001-9081.2018020481

As a form of deep text emotion understanding, emotion cause detection has become a hot topic in text emotion analysis. However, current research usually treats emotion cause detection and emotion recognition as two independent tasks, which easily leads to error propagation. Considering that emotion cause detection and emotion recognition interact with each other, and that the emoticons in microblog text usually express the emotion of the text, a joint model of emotion cause detection and emoticon emotion recognition based on the Bi-directional Long Short-Term Memory Conditional Random Field (Bi-LSTM-CRF) model was proposed. The model formalizes emotion cause detection and emotion recognition as a unified sequence labeling problem, which makes full use of the interaction between emotion causes and emotions and processes the two tasks simultaneously. The experimental results show that the model achieves an F-score of 82.70% in emotion cause detection and 74.74% in emoticon emotion recognition. Compared with the serial model, the F-scores are improved by 5.82% and 17.12% respectively, which indicates that the joint model can effectively reduce error propagation and improve the F-scores of both tasks.
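
One way to picture a unified sequence labeling formulation is a single tag sequence that carries both kinds of label. The sketch below is only an illustration under assumed conventions: the tag names (`B-CAUSE`, `I-CAUSE`, `EMO-...`) and the helper functions are hypothetical and are not taken from the paper.

```python
def merge_tags(cause_tags, emotion_tags):
    """Merge cause tags and emoticon emotion tags into one tag sequence.

    Assumption: a token is either part of a cause span, an emoticon
    carrying an emotion label, or neither ("O").
    """
    if len(cause_tags) != len(emotion_tags):
        raise ValueError("tag sequences must have the same length")
    unified = []
    for cause, emotion in zip(cause_tags, emotion_tags):
        if cause != "O" and emotion != "O":
            raise ValueError("a token cannot carry both kinds of tag")
        unified.append(cause if cause != "O" else emotion)
    return unified


def split_tags(unified_tags):
    """Recover the two task-specific tag sequences from the unified one."""
    cause = [t if t.endswith("-CAUSE") else "O" for t in unified_tags]
    emotion = [t if t.startswith("EMO-") else "O" for t in unified_tags]
    return cause, emotion


# Toy microblog: three words followed by one emoticon token.
tokens = ["lost", "my", "keys", "[cry]"]
cause_tags = ["B-CAUSE", "I-CAUSE", "I-CAUSE", "O"]
emotion_tags = ["O", "O", "O", "EMO-sad"]
unified = merge_tags(cause_tags, emotion_tags)
```

A single sequence labeler trained on `unified` predicts both tasks in one pass, which is the property the joint formulation relies on to avoid passing errors from one stage to the next.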

Product property sentiment analysis based on neural network model

LIU Xinxing, JI Donghong, REN Yafeng

Journal of Computer Applications 2017, 37 (6): 1735-1740. DOI: 10.11772/j.issn.1001-9081.2017.06.1735

To address the poor results of product property sentiment analysis obtained by simple neural network models based on word vectors, a gated recursive neural network model integrating discrete features and word vector embeddings was proposed. Firstly, the sentences were modeled with direct recurrent graph, and the gated recursive neural network model was adopted to perform product property sentiment analysis. Then, the discrete features and word vector embeddings were integrated in the gated recursive neural network. Finally, feature extraction and sentiment analysis were completed in three different task models: a pipeline model, a joint model and a collapsed model. The experiments were conducted on the laptop and restaurant review datasets of SemEval-2014, with the macro F1 score as the evaluation metric. The gated recursive neural network model achieved F1 scores of 48.21% and 62.19%, nearly 1.5 percentage points higher than the ordinary recursive neural network model. The results indicate that the gated recursive neural network can capture complicated features and enhance the performance of product property sentiment analysis. The proposed model integrating discrete features and word vector embeddings achieved F1 scores of 49.26% and 63.31%, which are 0.5 to 1.0 percentage points higher than the baseline methods. The results show that discrete features and word vector embeddings complement each other, and also that a neural network model based only on word embeddings has room for improvement. Among the three task models, the pipeline model achieves the highest F1 scores, so it is better to complete feature extraction and sentiment analysis separately.

Twitter text normalization based on unsupervised learning algorithm

DENG Jiayuan, JI Donghong, FEI Chaoqun, REN Yafeng

Journal of Computer Applications 2016, 36 (7): 1887-1892. DOI: 10.11772/j.issn.1001-9081.2016.07.1887

Twitter messages contain a large number of nonstandard tokens, created unintentionally or intentionally by users. Normalizing these nonstandard tokens is crucial for various natural language processing applications. Because existing normalization systems perform poorly, a novel unsupervised normalization system was proposed. First, a standard dictionary was used to determine whether a tweet needs to be normalized. Second, each nonstandard token was assigned to 1-to-1 or 1-to-N recovery based on its characteristics. For 1-to-N recovery, the nonstandard token was divided into multiple possible words using forward and backward search. Third, normalization candidates were generated for the nonstandard tokens among the multiple possible words by integrating random walk and a spelling checker. Finally, the best normalized tweet was obtained by scoring all the candidates with an n-gram language model. The experimental results on a manually annotated dataset show that the proposed approach obtains an F-score of 86.4%, which is 10 percentage points higher than that of the current best graph-based random walk algorithm.
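
The 1-to-N splitting step can be sketched as follows. This is an illustration only: it assumes that "forward and backward search" means greedy longest-match segmentation from each end of the token, which the abstract does not confirm, and the function names and toy vocabulary are hypothetical.

```python
def forward_split(token, vocab):
    """Greedy longest-match segmentation scanning left to right.

    Returns None when the greedy scan gets stuck (no backtracking).
    """
    words, start = [], 0
    while start < len(token):
        for end in range(len(token), start, -1):
            if token[start:end] in vocab:
                words.append(token[start:end])
                start = end
                break
        else:
            return None
    return words


def backward_split(token, vocab):
    """Greedy longest-match segmentation scanning right to left."""
    words, end = [], len(token)
    while end > 0:
        for start in range(0, end):
            if token[start:end] in vocab:
                words.insert(0, token[start:end])
                end = start
                break
        else:
            return None
    return words


def candidate_splits(token, vocab):
    """Collect the distinct multi-word splits found by both directions."""
    results = []
    for split in (forward_split(token, vocab), backward_split(token, vocab)):
        if split is not None and len(split) > 1 and split not in results:
            results.append(split)
    return results


# Toy dictionary chosen so that the two directions disagree.
toy_vocab = {"the", "they", "are", "yare"}
splits = candidate_splits("theyare", toy_vocab)
```

Here the forward scan yields `["they", "are"]` and the backward scan yields `["the", "yare"]`, which shows why keeping both directions gives a later ranking stage more than one candidate to choose from.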

Slight-pause marks boundary identification based on conditional random field

MO Yiwen, JI Donghong, HUANG Jiangping

Journal of Computer Applications 2015, 35 (10): 2838-2842. DOI: 10.11772/j.issn.1001-9081.2015.10.2838

The boundary identification of punctuation marks is an important research topic in natural language processing and is the basis of applications such as word segmentation and phrase chunking. To identify the boundaries of Chinese slight-pause marks, which separate coordinate words and phrases in Chinese, the Conditional Random Field (CRF), a model used for sequence segmentation and labeling, was adopted. First, the slight-pause mark boundary recognition task was described as two types; then the corpus tagging method and process for slight-pause marks and the feature selection were studied. Based on corpus recommendation and ten-fold cross validation, a series of experiments on slight-pause marks were carried out. The experimental results show that the proposed method, with the selected boundary identification features, is effective for slight-pause mark boundary identification: the F-measure of boundary identification increased by 10.57% over the baseline, and the F-measure of words divided by slight-pause marks reached 85.24%.

Real-time advertising trigger with advertiser behavioral analysis

XIE Zhongqian, CHANG Xiao, JI Donghong

Journal of Computer Applications 2014, 34 (9): 2566-2570. DOI: 10.11772/j.issn.1001-9081.2014.09.2566

In search engine advertising, the correlation between the auction word (Bidword) and the user's query (Query) needs to be calculated in real time, and this relevance calculation must take into account dynamic Term weights in advertisements and the assessment of the commercial value of phrases. To deal with these problems, a phrase relevance calculation approach named ADPCB was proposed based on behavioral analysis and the Continuous Bag-Of-Words (CBOW) model. Firstly, the vector of each Term was obtained by CBOW. Secondly, advertiser behavior was analyzed to construct a global empowerment tree of phrases, and the phrase structure was analyzed to obtain dynamic Term weights. Finally, the distributed representation of a phrase, produced by a linear combination of Term vectors with the Term weights, was applied to measure the relevance between Bidword and Query. Experiments were conducted with Word2vec on 10000 Query-Bidword pairs with editorial judgments (ratio of positive to negative pairs 1:1). ADPCB performed better than Term Frequency-Inverse Document Frequency (TF-IDF) combined with CBOW; when the accuracy was 0.70, ADPCB achieved higher recall than Latent Dirichlet Allocation (LDA), BM25 (Best Match25) and TF-IDF. The experimental results and analysis show that ADPCB can recognize the commercial value of a phrase and thereby reduce the number of advertisements triggered by Queries of low commercial value, and that it can be used in real-time calculation scenarios.
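
The final step, building a phrase vector as a weighted linear combination of Term vectors and comparing two phrases, might look like the sketch below. The toy vectors and weights stand in for CBOW embeddings and the behavior-derived Term weights, and cosine similarity is an assumed choice of relevance measure; none of these specifics come from the abstract.

```python
import math


def phrase_vector(terms, term_vectors, term_weights):
    """Weighted linear combination of the vectors of a phrase's Terms.

    Terms without a vector are skipped; Terms without a weight default
    to 1.0 (both are assumptions made for this sketch).
    """
    dim = len(next(iter(term_vectors.values())))
    combined = [0.0] * dim
    for term in terms:
        if term not in term_vectors:
            continue
        weight = term_weights.get(term, 1.0)
        for d, value in enumerate(term_vectors[term]):
            combined[d] += weight * value
    return combined


def cosine(u, v):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical 2-dimensional Term vectors and dynamic Term weights.
toy_vectors = {"cheap": [1.0, 0.0], "flights": [0.0, 1.0]}
toy_weights = {"cheap": 0.2, "flights": 0.8}

bidword = phrase_vector(["cheap", "flights"], toy_vectors, toy_weights)
score_flights = cosine(bidword, phrase_vector(["flights"], toy_vectors, {}))
score_cheap = cosine(bidword, phrase_vector(["cheap"], toy_vectors, {}))
```

Because "flights" carries the larger weight, the Bidword vector lies closer to the Query "flights" than to the Query "cheap", which is the effect dynamic Term weights are meant to have on the relevance score.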

Summary extraction of news comments based on weighed textual matrix factorization and information entropy model

GUO Yujing, JI Donghong

Journal of Computer Applications 2014, 34 (10): 2859-2864. DOI: 10.11772/j.issn.1001-9081.2014.10.2859

This paper addressed the selection of the most interesting and useful comments for an online news article. For the problem of extracting a summary of news comments, a new method combining Weighted Textual Matrix Factorization (WTMF) and information entropy was introduced and shown to be effective for the automatic extraction of social media comments. The representation of tweets and news was built on a heterogeneous-graph WTMF model, which alleviated the sparsity problem of short text and preserved the similarity of information. Meanwhile, according to the character distribution of tweets, binary entropy and continuous entropy were built to guarantee the diversity of information. Finally, exploiting submodularity, a greedy algorithm was designed to obtain an approximately optimal solution to the optimization problem. The experimental results show that the method combining WTMF and information entropy can effectively improve the performance of comment summary extraction for social media. The recall and F1 value on ROUGE-2 reach 0.40074 and 0.27330 respectively, which are 0.05 and 0.03 higher than those of the Biterm Topic Model (BTM), an extended model of Latent Dirichlet Allocation (LDA). The proposed model effectively improves the quality of news comment summaries.
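
The greedy selection step can be illustrated with a generic sketch. The objective used here is simple word coverage, a standard monotone submodular function swapped in for illustration; it is not the WTMF-plus-entropy objective of the paper, and the function names are hypothetical.

```python
def coverage(selected, comments):
    """Number of distinct words covered by the selected comments."""
    covered = set()
    for index in selected:
        covered |= comments[index]
    return len(covered)


def greedy_select(comments, budget):
    """Pick up to `budget` comments, each time taking the best marginal gain.

    Stops early when no remaining comment adds anything new.
    """
    selected = []
    while len(selected) < budget:
        base = coverage(selected, comments)
        best_index, best_gain = None, 0
        for index in range(len(comments)):
            if index in selected:
                continue
            gain = coverage(selected + [index], comments) - base
            if gain > best_gain:
                best_index, best_gain = index, gain
        if best_index is None:
            break
        selected.append(best_index)
    return selected


# Toy comments represented as sets of words.
toy_comments = [{"a", "b", "c"}, {"a", "b"}, {"d"}]
chosen = greedy_select(toy_comments, 2)
```

On the toy input the second pick is the short comment containing the unseen word rather than the longer but redundant one, which is how marginal-gain selection favors diversity. For monotone submodular objectives under a cardinality constraint, this greedy rule is known to achieve at least a (1 - 1/e) fraction of the optimal value.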
